Your lab/homework must be submitted in Moodle as two files: (1) an R Markdown file (.Rmd); (2) an HTML file. Other formats will not be accepted. Your responses must be supported by both textual explanations and the code you used to produce your results.
Part 1
Your first task is to recreate the animated gapminder plot for the full set of years.
First, use the following code to read the three data sets into R:
gdp_per_cap <-
  read.csv(
    "income_per_person_gdppercapita_ppp_inflation_adjusted.csv",
    header = TRUE,
    stringsAsFactors = FALSE,
    check.names = FALSE
  )
life_exp <-
  read.csv(
    "life_expectancy_years.csv",
    header = TRUE,
    stringsAsFactors = FALSE,
    check.names = FALSE
  )
pop <-
  read.csv(
    "population_total.csv",
    header = TRUE,
    stringsAsFactors = FALSE,
    check.names = FALSE
  )
For reference, here is the gapminder dataset that is provided in the gapminder package:
data(gapminder, package = "gapminder")
head(gapminder)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134
Use the code from the introductory presentation to generate the animated plot for the entire period.
Hints:
i. Make sure to define your variables with the correct data types (character, numeric, etc.).
ii. Consider using countrycode::countrycode (that is, the countrycode function from the countrycode package).
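Both hints can be tried out on a toy example first. The sketch below assumes a small made-up data frame (`toy_wide`, with invented values, not part of the assignment data) whose year columns are named like those in the CSV files. It shows that year labels taken from column names are character strings until they are converted, and, only if the countrycode package is installed, how `countrycode()` maps country names to continents.

```r
# Toy wide table (made-up values), shaped like the assignment CSV files:
# one row per country, one column per year.
toy_wide <- data.frame(
  country = c("Cuba", "Iraq", "United States"),
  `1919` = c(35.1, 30.2, 55.3),
  `1920` = c(35.4, 30.5, 55.4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Year labels come from column names, so they are character strings.
year_labels <- setdiff(colnames(toy_wide), "country")
class(year_labels)

# Convert to numeric before filtering or animating by year.
years <- as.numeric(year_labels)
class(years)

# Hint ii: map country names to continents (runs only if countrycode is installed).
if (requireNamespace("countrycode", quietly = TRUE)) {
  toy_wide$continent <- countrycode::countrycode(
    sourcevar = toy_wide$country,
    origin = "country.name",
    destination = "continent"
  )
  print(toy_wide[, c("country", "continent")])
}
```

The same conversion applies to the `Year` column produced when the real tables are reshaped to long format.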
A solution:
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.5 ✓ dplyr 1.0.3
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Reshape each wide table (one column per year) to long format:
# one row per country-year combination.
gdp_per_cap_long <-
  gdp_per_cap %>%
  pivot_longer(setdiff(colnames(gdp_per_cap), "country"), names_to = "Year", values_to = "GDP_per_cap")
life_exp_long <-
  life_exp %>%
  pivot_longer(setdiff(colnames(life_exp), "country"), names_to = "Year", values_to = "Life_Expectancy")
pop_long <-
  pop %>%
  pivot_longer(setdiff(colnames(pop), "country"), names_to = "Year", values_to = "Population")
library(countrycode)
gdp_life_pop <-
  left_join(gdp_per_cap_long, life_exp_long, by = c("country", "Year")) %>%
  left_join(pop_long, by = c("country", "Year")) %>%
  drop_na() %>%
  mutate(Continent = countrycode(sourcevar = country, origin = "country.name", destination = "continent")) %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year <= 2020)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
gg <- ggplot(gdp_life_pop, aes(GDP_per_cap, Life_Expectancy, color = Continent)) +
  geom_point(aes(size = Population, frame = Year, ids = country)) +
  scale_x_log10()
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(gg) %>% animation_opts(frame = 100)
(life expectancy for Cuba in 1919) / (life expectancy for the US in 1919),
(life expectancy for Cuba in 1920) / (life expectancy for the US in 1920),
(life expectancy for Iraq in 1919) / (life expectancy for the US in 1919),
(life expectancy for Iraq in 1920) / (life expectancy for the US in 1920), etc.
Hint: if you do this correctly, the point that represents the US should always have a y-value of 1.
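The ratios listed above can be checked on a toy table before working with the full data. This is a minimal base R sketch with made-up values (the `toy` data frame and its numbers are invented for illustration). It uses `merge()` to attach the US value for each year, which is one of several possible approaches and not necessarily the one intended by the assignment.

```r
# Toy long-format data (made-up values, not the assignment data).
toy <- data.frame(
  country = rep(c("Cuba", "Iraq", "United States"), times = 2),
  Year = rep(c(1919, 1920), each = 3),
  Life_Expectancy = c(30, 24, 60, 33, 22, 55),
  stringsAsFactors = FALSE
)

# For each year, look up the US value and attach it to every row of that year.
us <- toy[toy$country == "United States", c("Year", "Life_Expectancy")]
names(us)[2] <- "US_Life_Expectancy"
toy <- merge(toy, us, by = "Year")

# Life expectancy relative to the US in the same year.
toy$relUS <- toy$Life_Expectancy / toy$US_Life_Expectancy

# Check the hint: every US row should have a ratio of exactly 1.
all(toy$relUS[toy$country == "United States"] == 1)
```

Because each country is divided by the US value from the same year, the US rows divide a number by itself, which is why the hint's check holds.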
A solution:
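# Life expectancy relative to the US, computed within each year:
gdp_life_pop <-
  gdp_life_pop %>%
  group_by(Year) %>%
  mutate(relUS = Life_Expectancy / Life_Expectancy[country == "United States"]) %>%
  # Drop the grouping so later steps operate on the whole table
  ungroup()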
gg <- ggplot(gdp_life_pop, aes(GDP_per_cap, relUS, color = Continent)) +
  geom_point(aes(size = Population, frame = Year, ids = country)) +
  # Outline the US point with a black circle so it is easy to follow
  geom_point(data = gdp_life_pop %>% filter(country == "United States"), aes(size = Population, frame = Year, ids = country), shape = 1, col = "black") +
  scale_x_log10()
## Warning: Ignoring unknown aesthetics: frame, ids
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(gg) %>% animation_opts(frame = 100)